The article proposes a model for the configuration management of open systems. The model 
aims at validation of configurations against given specifications. An extension of decision 
graphs is proposed to express specifications. The proposed model can be used by software 
developers to validate their own configurations across different versions of the components, 
or to validate configurations that include components by third parties. The model can also 
be used by end-users to validate compatibility among different configurations of the same 
application. The proposed model is first discussed in some application scenarios and then 
formally defined. Moreover, a type discipline is given to formally define validation of a 
configuration against a system specification. 

1 Introduction 

Defining a software product as a system of components is common in both the theory and the practice of Software Engineering [ELH+05]. Many models and standards [ISO05, IEE05] of the software development process consider configuration management as the set of activities for identifying, controlling and managing software components during the life-cycle of the product, in particular at project milestones, when baselines and releases have to be arranged and frozen. 

The wide availability of open formats and the great success of open source projects make configuration management an ever-growing issue. Many developers can create add-ons, or even modify the core components of an application to derive a different one. On the user side, Internet distribution allows everyone to choose among the available components and to download them to extend and update their applications. 

As a result, when open systems are involved, the identification of components, the selection of their right versions and the control over their composition, i.e. configuration management, no longer belong to the closed environment of a single software house, but to a wide context in which many developers participate and where the end user is an active agent (seriously concerned with the reliability of the results). 
In this paper we propose a model for the configuration management of open systems. The model aims at the validation of configurations with respect to given specifications. Specifications may be defined by the original software developer to validate its own configurations across different versions of the components, or to validate configurations that include third-party components. Specifications may even be defined by users themselves to validate compatibility among different configurations of the same application. 

The proposed model is first discussed in some application scenarios and then formally defined. Moreover, a type discipline [Car96] is given to formally define the validation of a configuration against a system specification. 

2 Scenarios and Basic Needs 

There are many examples of applications in which an end user is involved in configuration management. The scenarios presented below help to identify different needs and to define the basic configuration management operations that our model aims to support. 

2.1 Scenarios 

Internet browsers are a simple and common example of end-user configuration management. Browsers are characterized by frequent updates, often needed for security reasons, and by many plug-ins for media visualization. The large majority of users passively install all suggested updates and add-ons, but some users need (or want) to decide what to install. 

A more complex and interesting example is given by software development environments. Such tools usually offer support for different compilers, libraries, debuggers, editors, modelling and code generation tools, and so on. Eclipse and Cygwin are two good examples, but there are many others. In these cases, the user is also concerned with compatibility among configurations in a given context. For instance, a programmer wants to install a new add-on to experiment with a new UML modelling tool, but also needs to check the compatibility of the desired upgrade with the installations of all the colleagues working on the same software project. 

On a larger scale, Linux distributions are probably the most complex case of large software system 
configuration management. A distribution offers a kernel version, a number of device drivers, a choice 
of system tools, several software utilities, and a number of applications. All of these software artifacts 
have dependencies that must be respected. The user willing to install a new application on top of 
the basic distribution is involved in a process that may lead to installing other artifacts, like libraries, 
data sets and so on, that can conflict or be incompatible with the previously installed ones. The usual 
solution is to provide official repositories which collect certified compatible versions of the most common 
applications (repositories are valuable assets in the success of distributions like Ubuntu, Fedora and the 
others). However, repositories are never complete and usually apply a very simple policy: provide only the latest stable versions. This approach does not support the needs of local groups of users, such as software developers or, in general, advanced users of applications. Such people need to define and control custom configurations of their working environments. 

Another interesting example is given by PC games. The typical game architecture is based on a game engine able to interpret a number of contents (maps, 3D models, AI scripts, ...). The engine may be closed but, even for proprietary games, contents often have open formats, and third parties, as well as users themselves, may develop new mods, namely modifications of contents that provide new levels, new game situations and so on. Players are often organized in communities, more or less independent of the game developer, that build and share mods of their favorite game [Cig01]. Technically speaking, a mod is a configuration made of some parts of the original game and a number of new ones. In this scenario, configuration control is not only a matter of reliability (the game has to run) but also a requirement to guarantee fair play. For instance, an easy way for a first-person shooter player to cheat is to mod the local configuration so as to give bright textures to the avatars of the other network players. This is another case where checking the compatibility among configurations is needed. 
2.2 Basic Operations 

There are many examples of automated procedures to install, configure and update applications. Configuration control of commercial closed applications has the advantage of full control over both the software architecture and the development process. In contexts where applications are open, the number of agents involved in software development and in the management of its configurations grows considerably: a shared model to describe and manage configurations is needed. 

In common speech, terms like "update", "extension", "version" and so on are used without a precise distinction. Our model describes such operations in terms of operations on components (which, in the end, are files or sets of files). The model has three basic operations: 

• update, that is, the substitution of components of the current configuration with more recent versions of them; 

• extension, that is, the addition to the current configuration of new components aimed at increasing the functionality of the software system; 

• compatibility check, that is, the examination of a configuration in order to verify a specification derived from another configuration. 

An installed configuration is just a set of components, and update and extension are operations on this set. In practice, such operations are easily recognizable in the scenarios introduced above: a plug-in for video playback can, at some point, be added (extension) to a web browser and, some time later, replaced with a new version (update); the same happens for a library in a development environment or for a dungeon map in a game. 
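As a minimal sketch of update and extension as set operations, the Python fragment below models a configuration as an immutable set of components. It assumes, for illustration only, that a component is identified by type, name, origin and version (the same four attributes used in the formal model of Section 4) and that an update replaces the component with the same type and name. The names `extend` and `update` and the browser example are hypothetical.

```python
from typing import FrozenSet, NamedTuple


class Component(NamedTuple):
    type: str
    name: str
    origin: str
    version: int


Configuration = FrozenSet[Component]


def extend(config: Configuration, new: Component) -> Configuration:
    """Extension: add a new component to the installed configuration."""
    return config | {new}


def update(config: Configuration, newer: Component) -> Configuration:
    """Update: replace the component with the same type and name by a more recent version."""
    old = {c for c in config if (c.type, c.name) == (newer.type, newer.name)}
    if not old:
        raise ValueError(f"no installed component to update with {newer.name}")
    if any(c.version >= newer.version for c in old):
        raise ValueError(f"{newer.name} version {newer.version} is not more recent")
    return (config - old) | {newer}


browser = frozenset({Component("App", "browser", "Vendor", 1)})
browser = extend(browser, Component("Plugin", "movie", "ThirdParty", 1))  # extension
browser = update(browser, Component("Plugin", "movie", "ThirdParty", 2))  # update
```

Because configurations are immutable values, the configuration before a step is never modified, which also makes it easy to keep earlier configurations around.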

Update and extension operations modify the installed configuration. The life of a software system installation evolves in steps that always move from one consistent configuration to another. Of course, update and extension must be reversible. Consistency is obtained by respecting a set of constraints, i.e. the configuration specification. The specification is defined by the architecture designer, but it must leave enough freedom to allow the variability required by other developers and end users. 
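The requirement that every step moves between consistent configurations, and that steps are reversible, can be sketched as follows. This is an illustrative sketch under our own assumptions: the specification is reduced to a list of Boolean predicates over a configuration, a step is any function from configuration to configuration, and reversibility is obtained simply by remembering previous configurations. `Installation`, `apply`, `undo` and `exactly_one_app` are hypothetical names.

```python
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical minimal model: a component is a (type, name, version) triple.
Component = Tuple[str, str, int]
Configuration = FrozenSet[Component]
Constraint = Callable[[Configuration], bool]


class Installation:
    """An installed configuration that only ever moves between consistent states."""

    def __init__(self, initial: Configuration, spec: List[Constraint]) -> None:
        if not all(check(initial) for check in spec):
            raise ValueError("initial configuration violates the specification")
        self.config = initial
        self.spec = spec
        self._history: List[Configuration] = []  # previous configurations, for undo

    def apply(self, step: Callable[[Configuration], Configuration]) -> bool:
        """Apply an update or extension; keep it only if the result is consistent."""
        candidate = step(self.config)
        if all(check(candidate) for check in self.spec):
            self._history.append(self.config)
            self.config = candidate
            return True
        return False  # rejected: the installed configuration is left untouched

    def undo(self) -> None:
        """Revert the last accepted step (raises IndexError if there is none)."""
        self.config = self._history.pop()


def exactly_one_app(cfg: Configuration) -> bool:
    # Example constraint: the configuration contains exactly one App component.
    return sum(1 for kind, _, _ in cfg if kind == "App") == 1


inst = Installation(frozenset({("App", "psycho", 1)}), [exactly_one_app])
added = inst.apply(lambda cfg: cfg | {("PScr", "def.psc", 1)})  # extension: accepted
rejected = inst.apply(lambda cfg: cfg | {("App", "other", 1)})  # second App: rejected
inst.undo()  # back to the initial configuration
```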

Compatibility check is an operation usually not identified as part of configuration management (often not identified at all). The closest existing implementations of the concept are the built-in anti-cheat features of many proprietary game platforms (for instance, Steam), but these are usually very restrictive and tied to the official game configurations. Scenarios where great openness and strict compatibility control must coexist, like the software development team, are not covered by this kind of solution. 

For compatibility check, additional constraints are defined by users who want to define the set of configurations that fulfil their specific needs, which is generally a subset of the whole set of consistent configurations. The best way of defining such a set is to derive the specification from a target configuration. For instance, in a software development team the target compatibility configuration is the baseline tool-set, while in the network game scenario it is the configuration of the master player who sets the rules of the deathmatch. 

3 Specification and Representation of Configurations 

In general, a software system, for instance one of the applications we use every day, is perceived as something identified and unique. In practice, a software system is a set of sets: the set of all working variations that can be built by combining the various versions of the components that belong to the software system or, more generally, to its runtime environment. 

Software system. A set of configurations, each of which is a set of interacting components. 

Intuitively, a component is a unit that may be separately distributed and installed by the end user [PDC01]. In practice, a component is made of one or more files, such as binaries, libraries, data sets in various formats, and so on. In our model, the concept of component type is introduced to define the characteristics of a component of a software system. At the maximum level of detail, each file must be considered as a component. Our model permits several levels of detail, thus accommodating the frequent situations where, for practical reasons, it is convenient to see components as sets of several files that are always distributed together. 

Configuration. A set of components that complies with the configuration specification of the software system. 

The specification of configurations of software systems is a well-known problem. A classical approach uses decision graphs to represent the possible choices in building a configuration from the set of components. The best-known example is given by and/or graphs [CW98]. Our model extends this approach to systems where the set of components is not strictly defined from the beginning. This fits open development, where the evolution of the set of components cannot be strictly controlled. 

The model introduces configuration specification graphs to specify and represent the set of valid component types of a software system and the composition rules used to build its valid configurations. 

Component Specification Node (CSN). A component type that, in a configuration, can be instantiated into one or more components. Every CSN contains information on the components it describes, such as their names (nm), origins (org) and version numbers (ver). Constraints can be imposed on such information, such as restrictions (e.g. the name must start with a certain prefix p, denoted p*) or consistency requirements (e.g. two components must have the same version number). Every CSN (leaves excluded) has an associated interval that bounds the number of components that can instantiate the set of its CSN successors. The root node is the component type that corresponds to the complete software system, which can be instantiated into any one of its valid configurations. 

Composition Specification Arc (CSA). A directed arc that connects a CSN to one of its CSN successors, i.e. to one of the types that may instantiate it in a valid configuration of the software system. Every CSA has an associated interval that bounds the number of components that may instantiate the CSN successor it points to. For every interval $[a, b]$ associated with a CSN and the intervals $[a_i, b_i]$ associated with its successor CSAs, it holds that $[\sum_i a_i, \sum_i b_i] \subseteq [a, b]$, i.e. $a \le \sum_i a_i$ and $\sum_i b_i \le b$. 

Dependence Specification Arc (DSA). A directed arc that connects a CSN to another CSN and expresses the possibility that, in a configuration, an instance of the former needs the presence of an instance of the latter. 

Configuration Specification Graph (CSG). A directed graph with a set of CSNs, a set of CSAs and a set of DSAs. If only CSNs and CSAs are considered, the CSG is a tree. 
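The three kinds of CSG elements map naturally onto a small data structure. The sketch below is our own illustrative encoding, not part of the model: CSN and CSA intervals are stored as pairs (with `math.inf` for an unbounded end, and leaves given [0, 0] as in the formal representation of Section 4), DSAs are plain pairs of CSN names, and the hypothetical method `composition_tree_root` checks the property stated above, namely that CSNs and CSAs alone form a tree. The Psycho graph of Figure 1 is used as the example.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, float]  # (lower, upper); math.inf stands for an unbounded upper end


@dataclass
class CSG:
    """Configuration Specification Graph: CSNs and CSAs with intervals, plus DSAs."""
    csns: Dict[str, Interval] = field(default_factory=dict)             # CSN -> interval
    csas: Dict[Tuple[str, str], Interval] = field(default_factory=dict)  # (parent, child) -> interval
    dsas: List[Tuple[str, str]] = field(default_factory=list)           # (dependent, required)

    def composition_tree_root(self) -> Optional[str]:
        """Return the root CSN if CSNs and CSAs (ignoring DSAs) form a tree, else None."""
        parents: Dict[str, List[str]] = {n: [] for n in self.csns}
        children: Dict[str, List[str]] = {n: [] for n in self.csns}
        for parent, child in self.csas:
            parents[child].append(parent)
            children[parent].append(child)
        roots = [n for n, ps in parents.items() if not ps]
        if len(roots) != 1 or any(len(ps) > 1 for ps in parents.values()):
            return None
        # Every CSN must be reachable from the root; this rules out detached cycles.
        seen, stack = set(), [roots[0]]
        while stack:
            node = stack.pop()
            seen.add(node)
            stack.extend(children[node])
        return roots[0] if seen == set(self.csns) else None


INF = math.inf
psycho = CSG(
    csns={"Psycho": (2, INF), "Bin": (3, 3), "PScr": (0, 0), "CGLib": (0, 0),
          "App": (0, 0), "MLib": (0, 0), "GLib": (0, 0)},
    csas={("Psycho", "Bin"): (1, 1), ("Psycho", "PScr"): (1, INF),
          ("Psycho", "CGLib"): (0, INF), ("Bin", "App"): (1, 1),
          ("Bin", "MLib"): (1, 1), ("Bin", "GLib"): (1, 1)},
    dsas=[("PScr", "CGLib"), ("PScr", "PScr")],  # the two dotted arrows of Figure 1
)
```

DSAs are kept apart from CSAs precisely because they may create cycles (PScr depends on PScr) without affecting the tree shape of the composition structure.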

Figure 1 shows the CSG of a very simple software system. The notation is borrowed from UML [UML10] class diagrams. The system is Psycho, a hypothetical application made by the IMsk software company that plays audio files while displaying programmable psychedelic effects on the screen. To produce the effects, the application interprets a simple script language that calls graphic primitives from libraries built against given APIs. For advanced users, a large part of the fun in using Psycho is programming scripts and sharing them with friends. Geeks enjoy coding new libraries featuring more sophisticated graphic primitives. 

The root CSN identifies the software system, Psycho, as a type that can be instantiated into one of its valid configurations. Each Psycho configuration is made of two or more components. The leftmost component type, Bin, which is the core part of the application, is made of exactly three component types: App, MLib and GLib. In this example, each of these component types can be instantiated by just one component. We can imagine them as actual files: an executable and two dynamic libraries, one for audio playback and the other for the default set of graphic effects. For the Bin component this CSG is quite strict: the intervals associated with CSNs and CSAs imply a fixed number of components, the specification of attributes implies given identifiers, and there are constraints on the version numbers and on the origin of the components. 
Figure 1: A Configuration Specification Graph for a simple application 



In particular, the version number of the libraries must not be lower than the version of the main executable, as IMsk allows patches of libraries but, if the core application is modified, it must be released with aligned versions of the libraries. Moreover, this CSG implies that the Bin components can be released only by IMsk, as the origin constraint specifies. In this example we used simple labels to specify the origin; in an implementation of the model, however, the origin is better specified by unique identifiers such as URLs, which provide both uniqueness and the ability to refer directly to a distribution repository. 

These strict rules apply only to the Psycho software system. Depending on the software license, it will 
be possible for others to define a new CSG with weaker constraints. It could specify another software 
system that is not Psycho, albeit tied to the original Psycho and sharing with it many components. This 
is just the kind of control that open source projects need when they approach the twilight zone where 
configurations mix with project forks. 

On its right side, the Psycho CSG is more open to extensions: scripts and custom graphic libraries can be added by anyone. The only constraints are that at least one script must exist (e.g. the one shipped with the original distribution by IMsk) and that the version numbers of scripts and custom libraries must be equal to or lower than the version number of the application. 
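The attribute constraints of the Psycho CSG can be checked directly on a flat list of components. The sketch below is a hypothetical encoding of the rules just described: Bin components (App, MLib, GLib) must originate from IMsk, library versions must not be lower than the application version (all components of Figure 2 share version 1, so equal versions must be allowed), script and custom-library versions must not exceed it, and at least one script must be present. Versions are plain integers, `psycho_violations` is our own name, and the example list is modelled on Figure 3 with version 2 assumed for mlib.so.

```python
from typing import Iterable, List, NamedTuple


class Component(NamedTuple):
    type: str     # the CSN the component instantiates, e.g. "App" or "PScr"
    name: str
    origin: str
    version: int


def psycho_violations(components: Iterable[Component]) -> List[str]:
    """Return the Psycho attribute constraints violated by a set of components."""
    comps = list(components)
    apps = [c for c in comps if c.type == "App"]
    if len(apps) != 1:
        return ["exactly one App component is required"]
    app_version = apps[0].version
    problems = []
    for c in comps:
        if c.type in ("App", "MLib", "GLib") and c.origin != "IMsk":
            problems.append(f"{c.name}: Bin components must originate from IMsk")
        if c.type in ("MLib", "GLib") and c.version < app_version:
            problems.append(f"{c.name}: library older than the application")
        if c.type in ("PScr", "CGLib") and c.version > app_version:
            problems.append(f"{c.name}: newer than the application")
    if not any(c.type == "PScr" for c in comps):
        problems.append("at least one script is required")
    return problems


psy2 = [
    Component("App", "psycho", "IMsk", 2), Component("MLib", "mlib.so", "IMsk", 2),
    Component("GLib", "glib.so", "IMsk", 2), Component("PScr", "def.psc", "IMsk", 1),
    Component("PScr", "my.psc", "Jane", 2), Component("CGLib", "julib.so", "Jack", 1),
]
```

Returning the list of violations, rather than a single Boolean, lets an installer explain why a configuration is rejected.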

Figure 1 shows two DSAs as dotted arrows. The first goes from PScr to CGLib, as scripts may depend on graphic primitives supplied by custom libraries; the second goes from PScr to PScr itself, as the script language allows script calls. DSAs are useful to express dependencies in the variable part of a configuration. Of course, scripts also depend on the application, but since the Bin subtree is already specified as mandatory in all of its components, there is no need to express these dependencies. Note that the CSG only allows for such dependencies; whether they actually occur depends on the configuration (see Figure 2 and Figure 3). 

A CSG specifies the valid configurations of a software system. A single configuration is described by a Configuration Graph (CG). Figure 2 and Figure 3 show two examples of CGs that describe two different configurations of the Psycho software system. Again, the notation is borrowed from UML: a CG is an object diagram. CGs and CSGs have a similar structure, but CG nodes are objects, instances of the CSG nodes, which are classes. Moreover, all composition relations in a CG are one-to-one: where a CSG has a single node that can be instantiated into a variable number of components, the CG represents each of these components. 

The CG of Figure 2 represents a very basic installation of Psycho, referred to as psy1. It has only the default script (def.psc) and no additional graphic library. All components originate from IMsk and are marked as version 1. 

The use of identifiers in CGs deserves a note. They are introduced to refer easily to a CG, to a subtree of a CG (like bin1 in psy1) or to a node of a CG. However, it is important to highlight that the identifiers of configurations are the CGs themselves. More precisely, for each node that is the root of a CG subtree, its actual identifier is the subtree. The identifiers of leaf nodes can be easily and usefully associated with file names, but the version number is also needed to actually identify an installed leaf component. 

Figure 2: A configuration compliant with the CSG of Figure 1. 

Figure 3: Another configuration compliant with the CSG of Figure 1 and compatible with the configuration of Figure 2 (but not conversely). 
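The idea that an inner node is identified by its whole subtree, while a leaf needs its file name plus its version, can be illustrated with hashable nested values. In this sketch (our own encoding, with hypothetical helpers `leaf` and `node`) a node identifier is its type together with the frozen set of its children's identifiers, so two subtrees get equal identifiers exactly when they have the same structure and the same leaf versions; labels such as bin1 become mere aliases.

```python
from typing import FrozenSet, Hashable, Tuple


def leaf(file_name: str, version: int) -> Tuple[str, int]:
    """A leaf is identified by its file name together with its version number."""
    return (file_name, version)


def node(node_type: str, *children: Hashable) -> Tuple[str, FrozenSet[Hashable]]:
    """An inner node is identified by its type and the identifiers of its whole subtree."""
    return (node_type, frozenset(children))


bin1 = node("Bin", leaf("psycho", 1), leaf("mlib.so", 1), leaf("glib.so", 1))
psy1 = node("Psycho", bin1, leaf("def.psc", 1))

# Convenience labels such as "bin1" are only aliases for structural identifiers.
aliases = {"bin1": bin1, "psy1": psy1}
```

Using frozen sets makes the identifier independent of the order in which children are listed, which matches the unordered composition of a CG.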

The CG of Figure 3 shows an upgraded and customized installation, psy2. Versions are more recent and a new script has been added (my.psc). Because this script needs a custom graphic library (julib.so, which provides functions to generate shapes inspired by the Julia set), the library is installed too. Thus, the DSA between my.psc and julib.so represents a real dependency between two actual components, a dependency that was "foreseen" in the CSG. 

Beyond the UML arc notation, the dependency information belongs to the dependent component: the description of my.psc tells the installer about the dependency and, if julib.so is not already present, the installer installs it as well. The same information can be used to prevent the uninstallation of julib.so while my.psc is still present in the configuration. 
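The installer behaviour just described can be sketched as follows. This is a hypothetical design, not part of the model: each component's description (here, an entry of the `catalogue` dictionary) lists the components it depends on, installation follows these dependencies, and uninstallation is refused while an installed component still depends on the one being removed.

```python
from typing import Dict, Set


class Installer:
    """Installs components following the dependency lists found in their descriptions."""

    def __init__(self, catalogue: Dict[str, Set[str]]) -> None:
        self.catalogue = catalogue  # component name -> names of the components it needs
        self.installed: Set[str] = set()

    def install(self, name: str) -> None:
        if name in self.installed:
            return
        # Mark first, so that self-dependences (e.g. scripts calling scripts) terminate.
        self.installed.add(name)
        for dependency in self.catalogue.get(name, set()):
            self.install(dependency)

    def uninstall(self, name: str) -> None:
        dependents = {other for other in self.installed
                      if other != name and name in self.catalogue.get(other, set())}
        if dependents:
            raise RuntimeError(f"{name} is still needed by {sorted(dependents)}")
        self.installed.discard(name)


installer = Installer({"my.psc": {"julib.so"}, "julib.so": set()})
installer.install("my.psc")  # also installs julib.so
```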

The two CGs of Figure 2 and Figure 3 also show an example of compatibility between configurations. Since psy2 is an updated superset of psy1, it is compatible with psy1: the user of psy2 can enjoy all the psychedelic effects seen by the user of psy1. However, psy1 is not compatible with psy2: the user of psy1 cannot see the Julia effects. 
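As an informal illustration only, and as a simplification of ours rather than the formal treatment, compatibility in this example can be approximated as follows: a candidate configuration is compatible with a target if, for every leaf component of the target, the candidate contains a component of the same type and name with an equal or newer version. The function name `is_compatible` and the flat (type, name, version) encoding are assumptions; the data is modelled on Figures 2 and 3, with version 2 assumed for mlib.so in psy2.

```python
from typing import Dict, Iterable, Tuple

Leaf = Tuple[str, str, int]  # (type, name, version); versions assumed non-negative


def is_compatible(candidate: Iterable[Leaf], target: Iterable[Leaf]) -> bool:
    """True if candidate has every target component, at the same or a newer version."""
    newest: Dict[Tuple[str, str], int] = {}
    for kind, name, version in candidate:
        newest[(kind, name)] = max(version, newest.get((kind, name), version))
    return all(newest.get((kind, name), -1) >= version for kind, name, version in target)


psy1 = [("App", "psycho", 1), ("MLib", "mlib.so", 1), ("GLib", "glib.so", 1),
        ("PScr", "def.psc", 1)]
psy2 = [("App", "psycho", 2), ("MLib", "mlib.so", 2), ("GLib", "glib.so", 2),
        ("PScr", "def.psc", 1), ("PScr", "my.psc", 2), ("CGLib", "julib.so", 1)]
```

With this reading, psy2 is compatible with psy1, while psy1 is not compatible with psy2 (it lacks my.psc and julib.so and has older core components), matching the discussion above.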

4 Formal Definition 

In this section we give a formal definition of configurations and of configuration specifications, followed by a formalization (by means of a type discipline; see [Car96] for a survey on type systems) of the procedure for checking the compliance of a configuration with a given specification. 

4.1 Configurations and specifications 

A software component can be uniquely identified by its name, origin (i.e. manufacturer) and version 
number. Moreover, the type associated with the component in a software system describes unambiguously 
the role of such a component in the system itself. 

Let Types be the finite set of all the component types of a software system. Moreover, let Names, Origins and Versions be the (possibly infinite) sets of all possible software component names, origins and versions, respectively. 

Definition 1 (Component identifier) A component identifier is a tuple $\langle t, n, o, v\rangle$, where $t \in Types$, $n \in Names$, $o \in Origins$ and $v \in Versions$. We denote with $CI$ the set of all component identifiers, namely $CI = Types \times Names \times Origins \times Versions$. 

A component identifier refers unambiguously to a component of a specific software system. In many cases, however, we shall need to refer to a set of components of the same type. For instance, a Component Specification Node, namely a node of a Configuration Specification Graph, may be instantiated in different configurations by different components of the same type. In order to describe a CSN we shall need an identifier (called abstract component identifier) for sets of components of the same type. 

Definition 2 (Abstract component identifier) An abstract component identifier is a tuple $\langle t, N, O, V\rangle$, where $t \in Types$, $N \subseteq Names$, $O \subseteq Origins$ and $V \subseteq Versions$. We denote with $ACI$ the set of all abstract component identifiers, namely $ACI = Types \times \wp(Names) \times \wp(Origins) \times \wp(Versions)$. 

Let Elements be the (possibly infinite) set of basic constituent elements that may be included in the software architecture (e.g. files). We now formally define the notions of component and of configuration of a software system. 

Definition 3 (Component) A component is a tuple having one of the following two forms: 

• $\langle ci, D, E\rangle$ with $E \subseteq Elements$ and finite; 

• $\langle ci, D, CI\rangle$ with $CI$ a finite set of component identifiers; 

where $ci$ is the component identifier of the component and $D$ is the finite set of identifiers of the components upon which component $ci$ depends. 

Given a component $c$ such that either $c = c' = \langle ci, D, E\rangle$ or $c = c'' = \langle ci, D, CI\rangle$, let $ci(c) = ci$, $dependencies(c) = D$, $children(c') = \emptyset$ and $children(c'') = CI$. Moreover, given a set of components $C$, let $cis(C)$ be the extension of $ci(c)$ to sets of components, defined in the obvious way. 

Note that in a software system it is not reasonable to explicitly specify a dependency of a component on the subcomponents constituting it (since it is an obvious dependency). Hence, for a given component $c$ we will assume $dependencies(c) \cap children(c) = \emptyset$. 

Definition 4 (Configuration) A configuration is a set $C$ of components such that: 

1. $\forall c \in C$ it holds $children(c) \subseteq cis(C)$; 

2. $\forall c \in C$ it holds $dependencies(c) \subseteq cis(C)$; 

3. $\exists! c \in C$ s.t. $\forall c' \in C$ it holds $ci(c) \notin children(c')$; 

4. $\forall c \in C$ either $\nexists c'$ s.t. $ci(c) \in children(c')$ or $\exists! c'$ s.t. $ci(c) \in children(c')$. 
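The four conditions of Definition 4 can be checked mechanically. The sketch below is a direct, illustrative transcription under our own encoding: identifiers are plain strings standing for the tuples of Definition 1, a leaf carries a set of elements and an inner node a set of children, and the disjointness assumption between dependencies and children stated above is checked as well. The example encodes the configuration of Figure 2 with abbreviated identifiers; `Component`, `kids` and `is_configuration` are hypothetical names.

```python
from typing import FrozenSet, Iterable, NamedTuple, Optional


class Component(NamedTuple):
    ci: str                                    # component identifier (abbreviated)
    deps: FrozenSet[str] = frozenset()         # D
    elements: FrozenSet[str] = frozenset()     # E, for leaves <ci, D, E>
    children: Optional[FrozenSet[str]] = None  # CI, for inner nodes <ci, D, CI>


def kids(c: Component) -> FrozenSet[str]:
    return c.children if c.children is not None else frozenset()


def is_configuration(components: Iterable[Component]) -> bool:
    comps = list(components)
    cis = {c.ci for c in comps}
    for c in comps:
        # Conditions 1 and 2, plus dependencies(c) and children(c) being disjoint.
        if not (kids(c) <= cis and c.deps <= cis and c.deps.isdisjoint(kids(c))):
            return False
    n_parents = [sum(c.ci in kids(p) for p in comps) for c in comps]
    # Condition 3: exactly one root; condition 4: no component has two parents.
    return n_parents.count(0) == 1 and all(n <= 1 for n in n_parents)


psy1 = [
    Component("psy1", children=frozenset({"bin1", "def.psc1"})),
    Component("bin1", children=frozenset({"psycho1", "mlib.so1", "glib.so1"})),
    Component("def.psc1", elements=frozenset({"def.psc"})),
    Component("psycho1", elements=frozenset({"psycho"})),
    Component("mlib.so1", elements=frozenset({"mlib.so"})),
    Component("glib.so1", elements=frozenset({"glib.so"})),
]
```

Dropping the root component leaves two parentless components (a forest), and dropping a leaf leaves a dangling child identifier; both are rejected.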

A configuration is a set that actually represents a tree. In fact, each component of a configuration is either a leaf of the tree (if it has the form $\langle ci, D, E\rangle$) or an intermediate node (if it has the form $\langle ci, D, CI\rangle$) with the components in $CI$ as children. The conditions in the definition of configuration ensure that a configuration is a well-defined tree, namely that all nodes are present in the configuration and that there is exactly one root. 

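It is easy to see that this definition of configuration corresponds to the notion of configuration proposed in Section 3, with the sets of dependencies representing Dependence Specification Arcs. For instance, the configuration in Figure 2 can be represented as follows. We have six components, represented by the following component identifiers: 

$$
\begin{aligned}
ci_{\mathrm{psy1}} &= \langle \mathit{Psycho}, \mathrm{psy1}, \mathrm{IMsk}, 1\rangle\\
ci_{\mathrm{bin1}} &= \langle \mathit{Bin}, \mathrm{bin1}, \mathrm{IMsk}, 1\rangle\\
ci_{\mathrm{def.psc1}} &= \langle \mathit{PScr}, \mathrm{def.psc}, \mathrm{IMsk}, 1\rangle\\
ci_{\mathrm{psycho1}} &= \langle \mathit{App}, \mathrm{psycho}, \mathrm{IMsk}, 1\rangle\\
ci_{\mathrm{mlib.so1}} &= \langle \mathit{MLib}, \mathrm{mlib.so}, \mathrm{IMsk}, 1\rangle\\
ci_{\mathrm{glib.so1}} &= \langle \mathit{GLib}, \mathrm{glib.so}, \mathrm{IMsk}, 1\rangle
\end{aligned}
$$

that are used in the following representation of the configuration: 

$$
\begin{aligned}
C_{\mathrm{psy1}} = \{\, & \langle ci_{\mathrm{psy1}}, \emptyset, \{ci_{\mathrm{bin1}}, ci_{\mathrm{def.psc1}}\}\rangle,\\
& \langle ci_{\mathrm{bin1}}, \emptyset, \{ci_{\mathrm{psycho1}}, ci_{\mathrm{mlib.so1}}, ci_{\mathrm{glib.so1}}\}\rangle,\\
& \langle ci_{\mathrm{def.psc1}}, \emptyset, \{\mathrm{def.psc}\}\rangle,\\
& \langle ci_{\mathrm{psycho1}}, \emptyset, \{\mathrm{psycho}\}\rangle,\\
& \langle ci_{\mathrm{mlib.so1}}, \emptyset, \{\mathrm{mlib.so}\}\rangle,\\
& \langle ci_{\mathrm{glib.so1}}, \emptyset, \{\mathrm{glib.so}\}\rangle \,\}
\end{aligned}
$$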

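Similarly, the configuration in Figure 3 can be represented as follows. We have eight components, represented by the following component identifiers: 

$$
\begin{aligned}
ci_{\mathrm{psy2}} &= \langle \mathit{Psycho}, \mathrm{psy2}, \mathrm{IMsk}, 2\rangle\\
ci_{\mathrm{bin2}} &= \langle \mathit{Bin}, \mathrm{bin2}, \mathrm{IMsk}, 2\rangle\\
ci_{\mathrm{def.psc2}} &= \langle \mathit{PScr}, \mathrm{def.psc}, \mathrm{IMsk}, 1\rangle\\
ci_{\mathrm{my.psc2}} &= \langle \mathit{PScr}, \mathrm{my.psc}, \mathrm{Jane}, 2\rangle\\
ci_{\mathrm{julib.so2}} &= \langle \mathit{CGLib}, \mathrm{julib.so}, \mathrm{Jack}, 1\rangle\\
ci_{\mathrm{psycho2}} &= \langle \mathit{App}, \mathrm{psycho}, \mathrm{IMsk}, 2\rangle\\
ci_{\mathrm{mlib.so2}} &= \langle \mathit{MLib}, \mathrm{mlib.so}, \mathrm{IMsk}, 2\rangle\\
ci_{\mathrm{glib.so2}} &= \langle \mathit{GLib}, \mathrm{glib.so}, \mathrm{IMsk}, 2\rangle
\end{aligned}
$$

that are used in the following representation of the configuration: 

$$
\begin{aligned}
C_{\mathrm{psy2}} = \{\, & \langle ci_{\mathrm{psy2}}, \emptyset, \{ci_{\mathrm{bin2}}, ci_{\mathrm{def.psc2}}, ci_{\mathrm{my.psc2}}, ci_{\mathrm{julib.so2}}\}\rangle,\\
& \langle ci_{\mathrm{bin2}}, \emptyset, \{ci_{\mathrm{psycho2}}, ci_{\mathrm{mlib.so2}}, ci_{\mathrm{glib.so2}}\}\rangle,\\
& \langle ci_{\mathrm{def.psc2}}, \emptyset, \{\mathrm{def.psc}\}\rangle,\\
& \langle ci_{\mathrm{my.psc2}}, \{ci_{\mathrm{julib.so2}}\}, \{\mathrm{my.psc}\}\rangle,\\
& \langle ci_{\mathrm{julib.so2}}, \emptyset, \{\mathrm{julib.so}\}\rangle,\\
& \langle ci_{\mathrm{psycho2}}, \emptyset, \{\mathrm{psycho}\}\rangle,\\
& \langle ci_{\mathrm{mlib.so2}}, \emptyset, \{\mathrm{mlib.so}\}\rangle,\\
& \langle ci_{\mathrm{glib.so2}}, \emptyset, \{\mathrm{glib.so}\}\rangle \,\}
\end{aligned}
$$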
Now we formally define the notion of configuration specification, which corresponds to a CSG. For the sake of simplicity, we shall not consider consistency constraints between different components of a specification. In the definition we shall use possibly infinite intervals over $\mathbb{N}$. We denote with Intervals the set of all possible intervals, namely $Intervals = \mathbb{N} \times (\mathbb{N} \cup \{\infty\})$. 

Let summation over $\mathbb{N} \cup \{\infty\}$ extend the usual summation over $\mathbb{N}$ as follows: $n + \infty = \infty + n = \infty + \infty = \infty$. Moreover, we define the following operations on intervals: summation, $[n_1, n'_1] \boxplus [n_2, n'_2] = [n_1 + n_2, n'_1 + n'_2]$, and inclusion, $[n_1, n'_1] \subseteq [n_2, n'_2] \iff n_1 \ge n_2 \wedge n'_1 \le n'_2$. 
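These interval operations translate directly into code. A minimal sketch, assuming intervals are encoded as pairs and that Python's `math.inf` plays the role of $\infty$ (float infinity already satisfies $n + \infty = \infty + \infty = \infty$); the names `interval_sum` and `included` are ours.

```python
import math
from typing import Tuple

INF = math.inf
Interval = Tuple[int, float]  # (lower bound, upper bound); the upper bound may be INF


def interval_sum(i: Interval, j: Interval) -> Interval:
    """[n1, n1'] ⊞ [n2, n2'] = [n1 + n2, n1' + n2']."""
    return (i[0] + j[0], i[1] + j[1])


def included(i: Interval, j: Interval) -> bool:
    """[n1, n1'] ⊆ [n2, n2'] iff n1 >= n2 and n1' <= n2'."""
    return i[0] >= j[0] and i[1] <= j[1]
```

For instance, summing the intervals [1, 1], [1, ∞] and [0, ∞] of the Psycho children in Figure 1 gives [2, ∞], which is included in [2, ∞].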

Definition 5 (Component specification) A component specification is a tuple $\langle aci, AD, ACI, [n, n']\rangle$ where $aci$ is the abstract identifier of the component specification, $AD$ is a finite set of abstract component identifiers describing the components upon which the components specified by $aci$ depend, $ACI$ is a finite set of subcomponent specifications, i.e. pairs $(aci', [m, m'])$ of an abstract component identifier and an interval in Intervals, such that for all distinct $(aci_1, [n_1, n'_1]), (aci_2, [n_2, n'_2]) \in ACI$ it holds $aci_1 \neq aci_2$, and $[n, n'] \in Intervals$. 

Given a component specification $cs = \langle \langle t, N, O, V\rangle, AD, ACI, [n, n']\rangle$, let $aci(cs) = \langle t, N, O, V\rangle$, $type(cs) = t$, $dependencies(cs) = AD$ and $children(cs) = \{aci \mid (aci, [n_1, n'_1]) \in ACI\}$. Moreover, given a set of component specifications $CS$, let $acis(CS)$ be the extension of $aci(cs)$ to sets of component specifications, defined in the obvious way. 

As for configurations, we require the dependencies and the children of a component to be disjoint. In order to make some of the forthcoming definitions easier, we allow dependencies to carry constraints on the name, origin and version of components. Hence, the abstract component identifier occurring in a dependency may differ from the one used in the component specification of that type, and disjointness is therefore required on types: $\forall \langle t, N, O, V\rangle \in children(cs).\ \nexists \langle t', N', O', V'\rangle \in dependencies(cs).\ t = t'$. 

Now we define configuration specifications. In the definition (and in what follows) we assume a partial order on abstract component identifiers: let $\le_{aci}$ be the least partial order on abstract component identifiers such that $\langle t_1, N_1, O_1, V_1\rangle \le_{aci} \langle t_2, N_2, O_2, V_2\rangle$ if and only if $t_1 = t_2$, $N_1 \subseteq N_2$, $O_1 \subseteq O_2$ and $V_1 \subseteq V_2$. 
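The order $\le_{aci}$ only compares sets, but Names, Origins and Versions may be infinite. The sketch below therefore uses a hypothetical encoding in which `None` (bound to the name `ANY`) stands for an entire, unrestricted set and finite sets are frozensets; an unrestricted set is never considered included in a finite one. It shows how a constrained dependency, such as a dependency on one specific custom library, relates to a more general abstract identifier of the same type.

```python
from typing import FrozenSet, NamedTuple, Optional

ANY = None  # stands for a whole, possibly infinite, set such as Names or Origins


class ACI(NamedTuple):
    type: str
    names: Optional[FrozenSet[str]]
    origins: Optional[FrozenSet[str]]
    versions: Optional[FrozenSet[int]]


def _included(a, b) -> bool:
    """Set inclusion a ⊆ b, where ANY denotes the whole universe."""
    if b is ANY:
        return True
    if a is ANY:
        return False  # an unrestricted set is never inside a finite one
    return a <= b


def leq_aci(a: ACI, b: ACI) -> bool:
    """<t1,N1,O1,V1> <=_aci <t2,N2,O2,V2> iff t1 = t2, N1 ⊆ N2, O1 ⊆ O2 and V1 ⊆ V2."""
    return (a.type == b.type and _included(a.names, b.names)
            and _included(a.origins, b.origins) and _included(a.versions, b.versions))


any_custom_lib = ACI("CGLib", ANY, ANY, ANY)
julia_lib = ACI("CGLib", frozenset({"julib.so"}), frozenset({"Jack"}), ANY)
```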

Definition 6 (Configuration specification) A configuration specification is a set $CS$ of component specifications such that: 

1. $\forall cs_1, cs_2 \in CS$ with $cs_1 \neq cs_2$ it holds $type(cs_1) \neq type(cs_2)$; 

2. $\forall cs \in CS$ it holds $children(cs) \subseteq acis(CS)$; 

3. $\forall cs \in CS$ it holds $\forall aci \in dependencies(cs).\ \exists aci' \in acis(CS)$ s.t. $aci \le_{aci} aci'$; 

4. $\exists! cs \in CS$ s.t. $\forall cs' \in CS$ it holds $aci(cs) \notin children(cs')$; 

5. $\forall cs \in CS$ either $\nexists cs'$ s.t. $aci(cs) \in children(cs')$ or $\exists! cs'$ s.t. $aci(cs) \in children(cs')$; 

6. $\forall \langle aci, AD, ACI, [n_1, n'_1]\rangle \in CS$ it holds $\boxplus\{[n_2, n'_2] \mid (aci', [n_2, n'_2]) \in ACI\} \subseteq [n_1, n'_1]$. 

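The CSG depicted in Figure 1 can be represented as follows. We have the following seven abstract component identifiers: 

$$
\begin{aligned}
aci_{\mathit{Psycho}} &= \langle \mathit{Psycho}, \{\mathrm{psy}k \mid k \in \mathbb{N}\}, \{\mathrm{IMsk}\}, \mathbb{N}\rangle\\
aci_{\mathit{Bin}} &= \langle \mathit{Bin}, \{\mathrm{bin}k \mid k \in \mathbb{N}\}, \{\mathrm{IMsk}\}, \mathbb{N}\rangle\\
aci_{\mathit{PScr}} &= \langle \mathit{PScr}, Names, Origins, \mathbb{N}\rangle\\
aci_{\mathit{CGLib}} &= \langle \mathit{CGLib}, Names, Origins, \mathbb{N}\rangle\\
aci_{\mathit{App}} &= \langle \mathit{App}, \{\mathrm{psycho}\}, \{\mathrm{IMsk}\}, \mathbb{N}\rangle\\
aci_{\mathit{MLib}} &= \langle \mathit{MLib}, \{\mathrm{mlib.so}\}, \{\mathrm{IMsk}\}, \mathbb{N}\rangle\\
aci_{\mathit{GLib}} &= \langle \mathit{GLib}, \{\mathrm{glib.so}\}, \{\mathrm{IMsk}\}, \mathbb{N}\rangle
\end{aligned}
$$

that are used in the following representation of the configuration specification: 

$$
\begin{aligned}
CS_{\mathit{Psycho}} = \{\, & \langle aci_{\mathit{Psycho}}, \emptyset, \{(aci_{\mathit{Bin}}, [1, 1]), (aci_{\mathit{PScr}}, [1, \infty]), (aci_{\mathit{CGLib}}, [0, \infty])\}, [2, \infty]\rangle,\\
& \langle aci_{\mathit{Bin}}, \emptyset, \{(aci_{\mathit{App}}, [1, 1]), (aci_{\mathit{MLib}}, [1, 1]), (aci_{\mathit{GLib}}, [1, 1])\}, [3, 3]\rangle,\\
& \langle aci_{\mathit{PScr}}, \{aci_{\mathit{PScr}}, aci_{\mathit{CGLib}}\}, \emptyset, [0, 0]\rangle,\\
& \langle aci_{\mathit{CGLib}}, \emptyset, \emptyset, [0, 0]\rangle,\\
& \langle aci_{\mathit{App}}, \emptyset, \emptyset, [0, 0]\rangle,\\
& \langle aci_{\mathit{MLib}}, \emptyset, \emptyset, [0, 0]\rangle,\\
& \langle aci_{\mathit{GLib}}, \emptyset, \emptyset, [0, 0]\rangle \,\}
\end{aligned}
$$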
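Conditions 1 and 6 of Definition 6 can be checked on this representation with a few lines of code. The sketch below is illustrative and abbreviates each abstract component identifier to its type (names, origins and versions are omitted); it follows condition 6 as stated, i.e. the sum of the children's intervals must be included in the node's interval, using `math.inf` for $\infty$. `CompSpec` and `satisfies_conditions_1_and_6` are hypothetical names.

```python
import math
from typing import List, NamedTuple, Tuple

INF = math.inf
Interval = Tuple[int, float]


class CompSpec(NamedTuple):
    type: str                                   # type of aci(cs); other fields omitted
    deps: Tuple[str, ...]                       # AD, abbreviated to types
    children: Tuple[Tuple[str, Interval], ...]  # ACI: (child type, interval) pairs
    interval: Interval                          # [n, n']


def satisfies_conditions_1_and_6(cs: List[CompSpec]) -> bool:
    types = [s.type for s in cs]
    if len(types) != len(set(types)):  # condition 1: one specification per type
        return False
    for s in cs:  # condition 6: the summed child intervals lie inside [n, n']
        lo = sum(i[0] for _, i in s.children)
        hi = sum(i[1] for _, i in s.children)
        if not (lo >= s.interval[0] and hi <= s.interval[1]):
            return False
    return True


cs_psycho = [
    CompSpec("Psycho", (), (("Bin", (1, 1)), ("PScr", (1, INF)), ("CGLib", (0, INF))), (2, INF)),
    CompSpec("Bin", (), (("App", (1, 1)), ("MLib", (1, 1)), ("GLib", (1, 1))), (3, 3)),
    CompSpec("PScr", ("PScr", "CGLib"), (), (0, 0)),
    CompSpec("CGLib", (), (), (0, 0)),
    CompSpec("App", (), (), (0, 0)),
    CompSpec("MLib", (), (), (0, 0)),
    CompSpec("GLib", (), (), (0, 0)),
]
```

Changing the interval of Bin to [2, 2], for example, violates condition 6, because its three mandatory children already sum to [3, 3].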

4.2 A type discipline for configuration compliance checking 

In this section we formalize the notion of compliance of a configuration with a certain specification. A configuration specification can be seen as the definition of a type for configurations, and the compliance of a configuration with a specification can be seen as a type checking process. We perform type checking by means of a type inference relation and a subtyping relation. Type inference computes a minimal specification satisfied by a given configuration; the subtyping relation is then used to compare the inferred minimal specification with the given one. If the subtyping relation holds between the two specifications, then the configuration is compliant with the given specification. 

The type inference relation is based on a notion of unification of configuration specifications. Given a component identifier $\langle t, n, o, v\rangle$, let $ci2aci(\langle t, n, o, v\rangle) = \langle t, \{n\}, \{o\}, \{v\}\rangle$, and let $ci2aci(CI)$ be its extension to sets of component identifiers, defined in the obvious way. Moreover, given two abstract component identifiers $\langle t, N_1, O_1, V_1\rangle$ and $\langle t, N_2, O_2, V_2\rangle$ of the same type, let $\langle t, N_1, O_1, V_1\rangle \oplus \langle t, N_2, O_2, V_2\rangle = \langle t, N_1 \cup N_2, O_1 \cup O_2, V_1 \cup V_2\rangle$. 
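The two auxiliary operations can be sketched directly. In this illustrative encoding (ours), identifiers are named tuples, abstract identifiers hold finite frozensets, and the function `oplus`, our name for $\oplus$, raises an error when the types differ, since $\oplus$ is only defined for identifiers of the same type.

```python
from typing import FrozenSet, NamedTuple


class CI(NamedTuple):
    type: str
    name: str
    origin: str
    version: int


class ACI(NamedTuple):
    type: str
    names: FrozenSet[str]
    origins: FrozenSet[str]
    versions: FrozenSet[int]


def ci2aci(ci: CI) -> ACI:
    """ci2aci(<t, n, o, v>) = <t, {n}, {o}, {v}>."""
    return ACI(ci.type, frozenset({ci.name}), frozenset({ci.origin}), frozenset({ci.version}))


def oplus(a: ACI, b: ACI) -> ACI:
    """<t, N1, O1, V1> ⊕ <t, N2, O2, V2> = <t, N1 ∪ N2, O1 ∪ O2, V1 ∪ V2>."""
    if a.type != b.type:
        raise ValueError("⊕ is defined only for abstract identifiers of the same type")
    return ACI(a.type, a.names | b.names, a.origins | b.origins, a.versions | b.versions)


scripts = oplus(ci2aci(CI("PScr", "def.psc", "IMsk", 1)), ci2aci(CI("PScr", "my.psc", "Jane", 2)))
```

Merging the two scripts of Figure 3 yields an abstract identifier that covers both names, both origins and both versions.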

Definition 7 (Unification) The unification operation $\uplus$ on configuration specifications is defined as follows: 

$$
\begin{aligned}
CS_1 \uplus CS_2 = {} & \{cs \mid cs \in CS_1 \wedge \nexists cs' \in CS_2.\ type(cs) = type(cs')\}\\
\cup {} & \{cs \mid cs \in CS_2 \wedge \nexists cs' \in CS_1.\ type(cs) = type(cs')\}\\
\cup {} & \{\langle aci_1 \oplus aci_2, AD_1 \uplus AD_2, ACI_1 \uplus ACI_2, [n_1 + n_2, n'_1 + n'_2]\rangle \mid\\
& \quad cs_1 = \langle aci_1, AD_1, ACI_1, [n_1, n'_1]\rangle \in CS_1 \wedge cs_2 = \langle aci_2, AD_2, ACI_2, [n_2, n'_2]\rangle \in CS_2 \wedge type(cs_1) = type(cs_2)\}
\end{aligned}
$$

where the unification of abstract component dependencies is defined as 

$$
\begin{aligned}
AD_1 \uplus AD_2 = {} & \{aci_1 \mid aci_1 \in AD_1 \wedge \nexists aci_2 \in AD_2.\ type(aci_1) = type(aci_2)\}\\
\cup {} & \{aci_2 \mid aci_2 \in AD_2 \wedge \nexists aci_1 \in AD_1.\ type(aci_1) = type(aci_2)\}\\
\cup {} & \{aci_1 \oplus aci_2 \mid aci_1 \in AD_1 \wedge aci_2 \in AD_2 \wedge type(aci_1) = type(aci_2)\}
\end{aligned}
$$

and the unification of abstract component children is defined as 

$$
\begin{aligned}
ACI_1 \uplus ACI_2 = {} & \{(aci_1, [0, n'_1]) \mid (aci_1, [n_1, n'_1]) \in ACI_1 \wedge \nexists aci_2, n_2, n'_2.\ ((aci_2, [n_2, n'_2]) \in ACI_2 \wedge type(aci_1) = type(aci_2))\}\\
\cup {} & \{(aci_2, [0, n'_2]) \mid (aci_2, [n_2, n'_2]) \in ACI_2 \wedge \nexists aci_1, n_1, n'_1.\ ((aci_1, [n_1, n'_1]) \in ACI_1 \wedge type(aci_1) = type(aci_2))\}\\
\cup {} & \{(aci_1 \oplus aci_2, [\min(n_1, n_2), \max(n'_1, n'_2)]) \mid (aci_1, [n_1, n'_1]) \in ACI_1 \wedge (aci_2, [n_2, n'_2]) \in ACI_2 \wedge type(aci_1) = type(aci_2)\}
\end{aligned}
$$

The unification operation takes two configuration specifications, which are sets of component specifications, and yields a single set of component specifications. 

If a component specification occurs in one configuration specification and no component specification of the same type occurs in the other, then it is included unchanged in the unification. If component specifications of the same type occur in both, a new component specification is built in the unification as follows: a) its abstract identifier merges the two abstract identifiers with $\oplus$ (names, origins and versions are united), b) its dependencies are the unification of the dependencies of the two specifications, c) its subcomponents result from the unification of the subcomponents, and d) its interval is the sum of the intervals of the two component specifications. 

The unification of abstract component dependencies simply computes the union of the two sets, merging 
elements of the same type by means of $\oplus$. 

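The unification of abstract component children proceeds analogously. When two children have the 
same type, the interval describing the possible number of components of that type in the unification 
ranges from the minimum of the two lower bounds to the maximum of the two upper bounds. A child that 
occurs in only one of the two specifications gets lower bound 0, since the other specification admits no 
component of that type. 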
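The three unification operations of Definition 7 can be sketched together in Python. This is a minimal sketch under assumptions of our own: the tuple encodings in the comments are hypothetical, $type(\cdot)$ is taken to be the type $t$ of the underlying abstract identifier, and each operand set is assumed to contain at most one element per type (the sketch does not handle sets with repeated types).

```python
# Hypothetical encodings (assumptions, not the paper's notation):
#   aci      = (t, N, O, V)               with N, O, V frozensets
#   interval = (lo, hi)
#   cs       = (aci, AD, ACIs, interval)
#     AD:   frozenset of aci               (abstract dependencies)
#     ACIs: frozenset of (aci, interval)   (abstract children)


def oplus(a, b):
    """Merge two abstract identifiers of the same type."""
    return (a[0], a[1] | b[1], a[2] | b[2], a[3] | b[3])


def _merge(xs, ys, type_of, only_one, both):
    """Keep elements whose type occurs on one side only; combine shared types."""
    left = {type_of(x): x for x in xs}
    right = {type_of(y): y for y in ys}
    out = {both(x, right[t]) if t in right else only_one(x) for t, x in left.items()}
    out |= {only_one(y) for t, y in right.items() if t not in left}
    return frozenset(out)


def unify_ad(ad1, ad2):
    """AD1 unified with AD2: a union merging identifiers of the same type."""
    return _merge(ad1, ad2, lambda a: a[0], lambda a: a, oplus)


def unify_aci(ch1, ch2):
    """One-sided children get lower bound 0; shared types take min/max bounds."""
    return _merge(
        ch1, ch2,
        lambda c: c[0][0],
        lambda c: (c[0], (0, c[1][1])),
        lambda c1, c2: (oplus(c1[0], c2[0]),
                        (min(c1[1][0], c2[1][0]), max(c1[1][1], c2[1][1]))),
    )


def unify_cs(cs1, cs2):
    """One-sided specs are kept as-is; shared types are merged, intervals summed."""
    return _merge(
        cs1, cs2,
        lambda cs: cs[0][0],
        lambda cs: cs,
        lambda c1, c2: (oplus(c1[0], c2[0]),
                        unify_ad(c1[1], c2[1]),
                        unify_aci(c1[2], c2[2]),
                        (c1[3][0] + c2[3][0], c1[3][1] + c2[3][1])),
    )
```

Factoring the common "one side only / both sides" pattern into a single helper mirrors the fact that the three definitions share the same three-part structure and differ only in how one-sided and shared elements are treated.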

Proposition 1 Given two configuration specifications $CS_1$ and $CS_2$, $CS_1 \uplus CS_2$ is a configuration 
specification. 

Now we define a type inference relation that gives the minimal specification satisfied by a given 
configuration. 

Definition 8 (Type Inference) The type inference relation $\vdash C : CS$ is the least relation on configu- 
rations and configuration specifications satisfying the following three rules: 

$$\vdash \{(ci, D, \emptyset)\} : \{(ci2aci(ci),\ ci2aci(D),\ \emptyset,\ [1, 1])\}$$

$$\frac{ACIs = \{(\bigoplus CI_t,\ [|CI_t|, |CI_t|]) \mid CI_t \in CI/{=_t}\}}{\vdash \{(ci, D, CI)\} : \{(ci2aci(ci),\ ci2aci(D),\ ACIs,\ [|CI|, |CI|])\}}$$

$$\frac{\vdash C_1 : CS_1 \qquad \vdash C_2 : CS_2}{\vdash C_1 \cup C_2 : CS_1 \uplus CS_2}$$

where $CI/{=_t}$ is the quotient set of $CI$ with respect to $=_t$, namely a partition of $CI$ corresponding to the 
set of all equivalence classes of $=_t$, and where $=_t$ is the least equivalence on component identifiers such 
that $(t_1, n_1, o_1, v_1) =_t (t_2, n_2, o_2, v_2)$ if and only if $t_1 = t_2$. 

The first rule describes the configuration specification of a component without subcomponents. The 
second rule describes the specification of a component with subcomponents: all the subcomponents with 
the same type are described by a single component specification. The third rule simply says that the 
specification of a union of configurations is the unification of the specifications. 
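The first two rules can be sketched in Python for a single component $(ci, D, CI)$; the third rule would fold such per-component results together with $\uplus$ and is omitted here to keep the block short. The encodings are hypothetical assumptions of ours, and we read $\bigoplus CI_t$ as the $\oplus$-merge of $ci2aci(ci)$ over the members of $CI_t$.

```python
from functools import reduce

# Hypothetical encodings (assumptions): identifier ci = (t, n, o, v);
# abstract identifier aci = (t, N, O, V) with frozensets; a component of a
# configuration is (ci, D, CI) where D and CI are frozensets of identifiers.


def ci2aci(ci):
    t, n, o, v = ci
    return (t, frozenset({n}), frozenset({o}), frozenset({v}))


def ci2aci_set(cis):
    return frozenset(ci2aci(c) for c in cis)


def oplus(a, b):
    return (a[0], a[1] | b[1], a[2] | b[2], a[3] | b[3])


def infer_component(ci, deps, children):
    """Minimal specification of the one-component configuration {(ci, D, CI)}."""
    if not children:
        # Rule 1: a component without subcomponents.
        return frozenset({(ci2aci(ci), ci2aci_set(deps), frozenset(), (1, 1))})
    # Rule 2: build the quotient CI/=_t by grouping children on their type t.
    classes = {}
    for child in children:
        classes.setdefault(child[0], []).append(child)
    # Each class CI_t yields one abstract child with interval [|CI_t|, |CI_t|].
    acis = frozenset(
        (reduce(oplus, (ci2aci(c) for c in cls)), (len(cls), len(cls)))
        for cls in classes.values()
    )
    n = len(children)
    return frozenset({(ci2aci(ci), ci2aci_set(deps), acis, (n, n))})
```

For instance, a component with four children, two of which share a type, yields three abstract children and the interval $[4, 4]$, with the two same-typed children merged into one entry with interval $[2, 2]$.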

Let us consider again our running example Psycho. It is easy to see that for configurations $C_{psy1}$ and 
$C_{psy2}$ both $\vdash C_{psy1} : CS_{psy1}$ and $\vdash C_{psy2} : CS_{psy2}$ hold. We show only $CS_{psy2}$ (since $CS_{psy1}$ is 
simpler), which is as follows: 

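$$
\begin{aligned}
CS_{psy2} = \{\ & (aci_{psy2},\ \emptyset,\ \{(aci_{bin2}, [1, 1]),\ (aci_{def.psc2} \oplus aci_{my.psc2}, [2, 2]),\ (aci_{julib.so2}, [1, 1])\},\ [4, 4]), \\
& (aci_{bin2},\ \emptyset,\ \{(aci_{psycho2}, [1, 1]),\ (aci_{mlib.so2}, [1, 1]),\ (aci_{glib.so2}, [1, 1])\},\ [3, 3]), \\
& (aci_{def.psc2} \oplus aci_{my.psc2},\ \{aci_{julib.so2}\},\ \emptyset,\ [0, 0]), \\
& (aci_{julib.so2},\ \emptyset,\ \emptyset,\ [0, 0]), \\
& (aci_{psycho2},\ \emptyset,\ \emptyset,\ [0, 0]), \\
& (aci_{mlib.so2},\ \emptyset,\ \emptyset,\ [0, 0]), \\
& (aci_{glib.so2},\ \emptyset,\ \emptyset,\ [0, 0])\ \}
\end{aligned}
$$

in which $aci_x$ stands for $ci2aci(ci_x)$ for any $x$. 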

Proposition 2 Given a configuration $C$, there exists a unique configuration specification $CS$ such that 
$\vdash C : CS$. 

Now we define a subtyping relation on configuration specifications. 
Definition 9 (Subtyping) Let $\leq_{cs}$ be the least partial order on component specifications such that 
$(aci_1, AD_1, ACI_1, [n_1, n'_1]) \leq_{cs} (aci_2, AD_2, ACI_2, [n_2, n'_2])$ if and only if the following conditions hold: 

• $aci_1 \leq_{aci} aci_2$; 

• for all $aci' \in AD_1$ there exists $aci'' \in AD_2$ such that $aci' \leq_{aci} aci''$; 

• for all $(aci', [n'_3, n''_3]) \in ACI_1$ there exists $(aci'', [n'_4, n''_4]) \in ACI_2$ such that $aci' \leq_{aci} aci''$ and 
$[n'_3, n''_3] \subseteq [n'_4, n''_4]$; 

• $[n_1, n'_1] \subseteq [n_2, n'_2]$. 

The subtyping relation $\leq_{cs}$ on configuration specifications is the least partial order such that $CS_1 \leq_{cs} 
CS_2$ holds if and only if for all $cs_1 \in CS_1$ there exists $cs_2 \in CS_2$ such that $cs_1 \leq_{cs} cs_2$. 

The subtyping relation essentially checks that every component of the smaller configuration specification 
is also contained in the bigger configuration specification, possibly with weaker constraints. 
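The conditions of Definition 9 can be checked directly, as in the Python sketch below. The relation $\leq_{aci}$ is not restated in this section, so `le_aci` is a loudly hypothetical placeholder (same type plus componentwise inclusion of $N$, $O$ and $V$); the tuple encodings are also our own assumptions.

```python
# Hypothetical encodings (assumptions): aci = (t, N, O, V) with frozensets;
# interval = (lo, hi) with lo <= hi; component spec = (aci, AD, ACIs, interval).


def le_aci(a, b):
    # PLACEHOLDER for the model's <=_aci, which is not restated in this section:
    # here we assume same type and componentwise inclusion of N, O and V.
    return a[0] == b[0] and a[1] <= b[1] and a[2] <= b[2] and a[3] <= b[3]


def interval_in(i, j):
    """[lo1, hi1] is contained in [lo2, hi2], for non-empty integer intervals."""
    return j[0] <= i[0] and i[1] <= j[1]


def le_compspec(c1, c2):
    """The four conditions of Definition 9 on component specifications."""
    aci1, ad1, ch1, iv1 = c1
    aci2, ad2, ch2, iv2 = c2
    return (
        le_aci(aci1, aci2)
        and all(any(le_aci(x, y) for y in ad2) for x in ad1)
        and all(any(le_aci(x, y) and interval_in(ix, iy) for y, iy in ch2)
                for x, ix in ch1)
        and interval_in(iv1, iv2)
    )


def le_cs(cs1, cs2):
    """CS1 <=_cs CS2: every element of CS1 is below some element of CS2."""
    return all(any(le_compspec(x, y) for y in cs2) for x in cs1)
```

Widening an interval or a version set in the larger specification keeps the relation true, while the converse direction fails, which is the "weaker constraints" reading given above.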

Definition 10 (Configuration compliant with a specification) A configuration $C$ is compliant with 
a configuration specification $CS$ if and only if there exists $CS'$ such that $\vdash C : CS'$ and $CS' \leq_{cs} CS$. 

If we consider again our running example Psycho, it is easy to see that, as expected, both configurations 
$C_{psy1}$ and $C_{psy2}$ are compliant with specification $CS_{psycho}$. In fact, as already said, both $\vdash C_{psy1} : CS_{psy1}$ 
and $\vdash C_{psy2} : CS_{psy2}$ hold. Moreover, both $CS_{psy1} \leq_{cs} CS_{psycho}$ and $CS_{psy2} \leq_{cs} CS_{psycho}$ hold. 

A notion of compatibility between configurations can be defined in a way that is similar to the definition 
of subtyping between specifications. In this definition we assume backward compatibility for individual 
components. This means that a component associated with a certain version number is assumed to be 
compatible with previous versions of the same component. 

Definition 11 (Compatible configurations) Let $\sqsubseteq_{ci}$ be the least partial order on component iden- 
tifiers such that $(t_1, n_1, o_1, v_1) \sqsubseteq_{ci} (t_2, n_2, o_2, v_2)$ holds if and only if $t_1 = t_2$, $n_1 = n_2$, $o_1 = o_2$ and 
$v_1 \leq v_2$. 

Let $\sqsubseteq_c$ be the least partial order on configurations such that $C_1 \sqsubseteq_c C_2$ holds if and only if for all 
$c_1 \in C_1$ there exists $c_2 \in C_2$ such that $ci(c_1) \sqsubseteq_{ci} ci(c_2)$. 

Given a configuration specification $CS$, a configuration $C_2$ is compatible with configuration $C_1$ if and 
only if both $C_1$ and $C_2$ are compliant with $CS$, and $C_1 \sqsubseteq_c C_2$. 
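The order on configurations in Definition 11 can be sketched in Python as follows. As assumptions of ours, versions are encoded as tuples of integers compared lexicographically, a configuration is a frozenset of components $(ci, D, CI)$ so that $ci(c)$ is the first field, and compliance with $CS$ is supplied as a hypothetical predicate `is_compliant` rather than implemented here.

```python
from typing import Callable, FrozenSet, Tuple

# Hypothetical encodings (assumptions): an identifier is (t, n, o, v) with v a
# tuple of ints compared lexicographically; a configuration is a frozenset of
# components (ci, D, CI), so ci(c) is simply c[0].
Version = Tuple[int, ...]
Identifier = Tuple[str, str, str, Version]
Component = Tuple[Identifier, FrozenSet[Identifier], FrozenSet[Identifier]]
Configuration = FrozenSet[Component]


def le_ci(a: Identifier, b: Identifier) -> bool:
    """Same t, n and o, and a version not greater than the other one."""
    return a[:3] == b[:3] and a[3] <= b[3]


def le_c(c1: Configuration, c2: Configuration) -> bool:
    """Each component of C1 has a same-or-newer counterpart in C2."""
    return all(any(le_ci(x[0], y[0]) for y in c2) for x in c1)


def compatible(c2: Configuration, c1: Configuration,
               is_compliant: Callable[[Configuration], bool]) -> bool:
    """C2 is compatible with C1, given a compliance check against some CS."""
    return is_compliant(c1) and is_compliant(c2) and le_c(c1, c2)
```

Under the backward compatibility assumption, replacing a component with a newer version of itself yields a compatible configuration, while a downgrade does not.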

In the Psycho example, configuration $C_{psy2}$ is, as expected, compatible with configuration 
$C_{psy1}$, since both configurations are compliant with $CS_{psycho}$ and $C_{psy1} \sqsubseteq_c C_{psy2}$ holds. 

5 Conclusions 

We have presented a model for software configuration management based on a UML notation and with a 
formal definition. UML is a well-established language to describe software system architectures, hence it 
naturally fits the domain of configuration management. The formal definition gives a solid mathematical 
foundation to the model and allows us to define a type discipline to check that a configuration is compliant 
with its specification. 

A preliminary version [Mor06, ICM06] of the model has been applied to several case studies of complex 
software systems that are representative of the scenarios described in Subsection 2.1. These case studies 
include the Spybot anti-spyware utility (as an example of systems that need very frequent updates), the 
Cygwin Unix-like environment for Windows (as an example of systems with huge and complex configu- 
ration trees) and the Racer multiplayer driving simulation game (as an example of systems that include 
many user-provided components). 

The diffusion of open source software, and the consequent possibility for many programmers to partici- 
pate in the development of software systems, makes the management of configurations a challenging issue. 
Moreover, open source software is often installed and managed by users themselves. In this context the 
need for flexible and reliable mechanisms for configuration management arises. Current solutions, such as 
those used to manage Linux distributions, are very centralized: users have to follow the evolution of the 
software components supported by the official repositories. In addition, they generally lack guarantees 
of reliability based on a formal theory. We believe that the proposed model meets both the need for 
flexibility and the need for reliability. 

Future work might include the definition of a digital format for the representation of specifications and 
configurations, for instance based on XML. The development of a management library supporting 
operations such as compliance checking, compatibility checking, installation, uninstallation and update 
could then be based on such a definition. 